FlexIO: Location-flexible Execution of In Situ Data Analytics for Large Scale Scientific Applications
Abstract
Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process simulation output data while simulations are running and before placing data on disk, that is, "in situ" and/or "in-transit". There are several options for placing in situ data analytics along the I/O path: on compute nodes, on staging nodes dedicated to analytics, or after data is stored on persistent storage. Different placements have different impacts on end-to-end performance and cost. The consequence is a need for flexibility in the location of in situ data analytics. The FlexIO facility described in this paper supports flexible placement of in situ analytics by offering simple abstractions and methods that help developers exploit the opportunities and trade-offs in performing analytics at different levels of the I/O hierarchy. Experimental results with several large-scale scientific applications demonstrate the importance of flexibility in analytics placement.
Keywords: I/O, In Situ Processing, Staging, Placement, Data Analytics
Similar resources
FlexAnalytics: A Flexible Data Analytics Framework for Big Data Applications with I/O Performance Improvement
Increasingly larger scale applications are generating an unprecedented amount of data. However, the increasing gap between computation and I/O capacity on High End Computing machines makes a severe bottleneck for data analysis. Instead of moving data from its source to the output storage, in-situ analytics processes output data while simulations are running...
Large-Scale Graph Analytics in Aster 6: Bringing Context to Big Data Discovery
Graph analytics is an important big data discovery technique. Applications include identifying influential employees for retention, detecting fraud in a complex interaction network, and determining product affinities by exploiting community buying patterns. Specialized platforms have emerged to satisfy the unique processing requirements of large-scale graph analytics; however, these platforms d...
Visual Analytics for Large Communication Trace Data
Executions of modern parallel programs often yield complex communications among compute nodes of large-scale clusters of workstations or supercomputers. Analyzing communication patterns is becoming increasingly critical to performance optimization. As the scale and complexity of parallel applications drastically increase, visualization has become a feasible means to conduct analysis of massiv...
Application of Big Data Analytics in Power Distribution Network
The smart grid enhances optimization in the generation, distribution and consumption of electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, the most common being the deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...
The Need for Resilience Research in Workflows of Big Compute and Big Data Scientific Applications
Projections and reports about exascale failure modes conclude that we need to protect numerical simulations and data analytics from an increasing risk of hardware and software failures and silent data corruptions (SDC) [1, 4]. At this scale, hardware and software failures could be as frequent as ten or more per day. According to [9], the semiconductor industry will have increased difficulty pre...